Automatically Generating Dockerfiles via Deep Learning: Challenges and Promises
Containerization allows developers to define the execution environment in
which their software needs to be installed. Docker is the leading platform in
this field, and developers who use it are required to write a Dockerfile for
their software. Writing Dockerfiles is far from trivial, especially when the
system has unusual requirements for its execution environment. Although several
tools exist to support developers in writing Dockerfiles, none of them is able
to generate entire Dockerfiles from scratch given a high-level specification of
the requirements of the execution environment. In this paper, we present a
study in which we aim to understand to what extent Deep Learning (DL), which
has been proven successful for other coding tasks, can be used for this
specific coding task. We first defined a structured natural language
specification for Dockerfile requirements and a methodology that we use to
automatically infer the requirements from the largest dataset of Dockerfiles
currently available. We used the obtained dataset, with 670,982 instances, to
train and test a Text-to-Text Transfer Transformer (T5) model, following the
current state-of-the-art procedure for coding tasks, to automatically generate
Dockerfiles from the structured specifications. The results of our evaluation
show that T5 performs similarly to the simpler IR-based baselines we
considered. We also report the open challenges associated with the application
of deep learning in the context of Dockerfile generation.
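The IR-based baselines mentioned in the abstract can be approximated by a simple retrieval scheme: given a new specification, return the Dockerfile paired with the most similar specification in the training set. The following is a minimal sketch under assumed details (whitespace tokenization, TF-IDF weighting, cosine similarity, and a toy corpus), not the paper's exact baseline:

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Compute sparse TF-IDF weight vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_dockerfile(query_spec, specs, dockerfiles):
    """Return the Dockerfile paired with the training spec most similar to the query."""
    docs = [s.split() for s in specs] + [query_spec.split()]
    vecs = tf_idf_vectors(docs)
    query_vec, spec_vecs = vecs[-1], vecs[:-1]
    best = max(range(len(specs)), key=lambda i: cosine(query_vec, spec_vecs[i]))
    return dockerfiles[best]

# Toy corpus of structured specs paired with Dockerfiles (illustrative values).
specs = ["python 3 with flask", "node 14 with npm build"]
dockerfiles = ["FROM python:3\nRUN pip install flask",
               "FROM node:14\nRUN npm install && npm run build"]
print(retrieve_dockerfile("python app using flask", specs, dockerfiles))
```

Such a retrieval baseline requires no training, which is what makes a neural model that merely matches it noteworthy.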
Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks
Deep learning (DL) techniques are gaining more and more attention in the
software engineering community. They have been used to support several
code-related tasks, such as automatic bug fixing and code comments generation.
Recent studies in the Natural Language Processing (NLP) field have shown that
the Text-To-Text Transfer Transformer (T5) architecture can achieve
state-of-the-art performance for a variety of NLP tasks. The basic idea behind
T5 is to first pre-train a model on a large and generic dataset using a
self-supervised task (e.g., filling masked words in sentences). Once the model
is pre-trained, it is fine-tuned on smaller and specialized datasets, each one
related to a specific task (e.g., language translation, sentence
classification). In this paper, we empirically investigate how the T5 model
performs when pre-trained and fine-tuned to support code-related tasks. We
pre-train a T5 model on a dataset composed of natural language English text and
source code. Then, we fine-tune such a model by reusing datasets used in four
previous works that used DL techniques to: (i) fix bugs, (ii) inject code
mutants, (iii) generate assert statements, and (iv) generate code comments. We
compare the performance of this single model with the results reported in the
four original papers proposing DL-based solutions for those four tasks. We show
that our T5 model, exploiting additional data for the self-supervised
pre-training phase, can achieve performance improvements over the four
baselines.
Comment: Accepted to the 43rd International Conference on Software Engineering (ICSE 2021).
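The self-supervised pre-training objective mentioned above (filling masked words) can be illustrated with a simplified span-corruption routine in the style of T5. The sentinel-token naming follows T5's convention, but the function itself is an illustrative sketch over whitespace tokens, not the actual pre-training pipeline:

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) span with a sentinel token, T5-style.

    Returns the corrupted input and the target the model must predict:
    each sentinel followed by the tokens it replaced, plus a final sentinel.
    """
    corrupted, target = [], []
    prev = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        corrupted.extend(tokens[prev:start])  # keep tokens before the span
        corrupted.append(sentinel)            # mask the span with a sentinel
        target.append(sentinel)
        target.extend(tokens[start:end])      # model must recover these tokens
        prev = end
    corrupted.extend(tokens[prev:])
    target.append(f"<extra_id_{len(spans)}>")
    return corrupted, target

tokens = "the quick brown fox jumps over the lazy dog".split()
inp, tgt = span_corrupt(tokens, [(1, 3), (5, 6)])
print(" ".join(inp))  # the <extra_id_0> fox jumps <extra_id_1> the lazy dog
print(" ".join(tgt))  # <extra_id_0> quick brown <extra_id_1> over <extra_id_2>
```

Because the target is derived mechanically from the input, arbitrarily large unlabeled corpora of code and text can serve as pre-training data.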
The Impact of API Change- and Fault-Proneness on the User Ratings of Android Apps
The mobile apps market is one of the fastest growing areas in information technology. To gain market share, developers must pay attention to building robust and reliable apps. In fact, users easily get frustrated by repeated failures, crashes, and other bugs; hence, they abandon some apps in favor of competing ones. In this paper we investigate how the fault- and change-proneness of APIs used by Android apps relates to their success, estimated as the average rating provided by users to those apps. First, in a study conducted on 5,848 (free) apps, we analyzed how the ratings that an app had received correlated with the fault- and change-proneness of the APIs such app relied upon. After that, we surveyed 45 professional Android developers to assess (i) to what extent developers experienced problems when using APIs, and (ii) how much they felt these problems could be the cause for unfavorable user ratings. The results of our studies indicate that apps having high user ratings use APIs that are less fault- and change-prone than the APIs used by low-rated apps. Also, most of the interviewed Android developers observed, in their development experience, a direct relationship between problems experienced with the adopted APIs and the users' ratings that their apps received.
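The first study above correlates per-app user ratings with the fault- and change-proneness of the APIs each app uses; a rank correlation such as Spearman's rho is a natural fit for ordinal rating data. A minimal pure-Python sketch (the data values are made up for illustration, and the paper's exact statistical machinery may differ):

```python
def rank(values):
    """Assign 1-based average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Made-up data: bug-fix counts of each app's APIs vs. its average user rating.
api_faults = [2, 9, 4, 15, 1]
app_rating = [4.5, 3.1, 4.0, 2.8, 4.7]
print(round(spearman(api_faults, app_rating), 2))  # -1.0: perfectly inverse ranks
```

A negative rho of this kind would match the paper's finding that higher-rated apps tend to rely on less fault-prone APIs, though the real data is of course far noisier than this toy example.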
An empirical study on the relation between identifiers and fault proneness
Poorly-chosen identifiers have been reported
in the literature as misleading and increasing the program
comprehension effort. Identifiers are composed of terms,
which can be dictionary words, acronyms, contractions, or
simple strings. We conjecture that the use of identical terms
in different contexts may increase the risk of faults. We
investigate our conjecture using a measure combining term
entropy and term context-coverage to study whether certain
terms increase the odds of methods being fault-prone.
Entropy measures the physical dispersion of terms
in a program: the higher the entropy, the more scattered
the terms are across the program. Context coverage measures
the conceptual dispersion of terms: the higher their context
coverage, the more unrelated the methods using them.
We compute term entropy and context-coverage of terms
extracted from identifiers in Rhino 1.4R3 and ArgoUML
0.16. We show statistically that methods containing terms
with high entropy and context-coverage are more fault-prone
than others.
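The entropy half of the measure described above captures how scattered a term's occurrences are across a program's methods. A small sketch of that idea as Shannon entropy over per-method occurrence counts (an illustrative formulation; the paper's exact definition may differ):

```python
import math

def term_entropy(occurrences):
    """Shannon entropy (bits) of a term's occurrence counts across methods.

    occurrences: one count per method. Higher entropy means the term is
    spread more evenly across the program rather than concentrated.
    """
    total = sum(occurrences)
    probs = [c / total for c in occurrences if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# A term concentrated in one method vs. one scattered over four methods.
print(term_entropy([10, 0, 0, 0]))  # 0.0: fully concentrated
print(term_entropy([3, 3, 3, 3]))   # 2.0: evenly scattered over 4 methods
```

Under the paper's conjecture, terms scoring high on both this entropy and on context coverage are the ones whose reuse across unrelated methods correlates with fault-proneness.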